Your browser doesn't support javascript.
Show: 20 | 50 | 100
Results 1 - 16 de 16
Filter
1.
J Am Med Inform Assoc ; 30(7): 1305-1312, 2023 06 20.
Article in English | MEDLINE | ID: covidwho-2325541

ABSTRACT

Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms such that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients highly probable to have Long COVID. Supported by RECOVER, N3C and NIH's All of Us study partnered to reproduce the output of N3C's trained model in the All of Us data enclave, demonstrating model extensibility in multiple environments. This case study in ML-based phenotype reuse illustrates how open-source software best practices and cross-site collaboration can de-black-box phenotyping algorithms, prevent unnecessary rework, and promote open science in informatics.


Subject(s)
Boxing , COVID-19 , Population Health , Humans , Electronic Health Records , Post-Acute COVID-19 Syndrome , Reproducibility of Results , Machine Learning , Phenotype
2.
Nat Commun ; 14(1): 2914, 2023 05 22.
Article in English | MEDLINE | ID: covidwho-2322120

ABSTRACT

Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID-a clinical diagnosis (n = 47,404) or a previously described computational phenotype (n = 198,514)-to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients' data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , United States/epidemiology , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Cohort Studies , SARS-CoV-2 , Vaccination
3.
Clin Infect Dis ; 2023 May 19.
Article in English | MEDLINE | ID: covidwho-2327178

ABSTRACT

BACKGROUND: Identifying individuals with a higher risk of developing severe COVID-19 outcomes will inform targeted or more intensive clinical monitoring and management. To date, there is mixed evidence regarding the impact of pre-existing autoimmune disease (AID) diagnosis and/or immunosuppressant (IS) exposure on developing severe COVID-19 outcomes. METHODS: A retrospective cohort of adults diagnosed with COVID-19 was created in the National COVID Cohort Collaborative enclave. Two outcomes, life-threatening disease, and hospitalization were evaluated by using logistic regression models with and without adjustment for demographics and comorbidities. RESULTS: Of the 2,453,799 adults diagnosed with COVID-19, 191,520 (7.81%) had a pre-existing AID diagnosis and 278,095 (11.33%) had a pre-existing IS exposure. Logistic regression models adjusted for demographics and comorbidities demonstrated that individuals with a pre-existing AID (OR = 1.13, 95% CI 1.09 - 1.17; P< 0.001), IS (OR= 1.27, 95% CI 1.24 - 1.30; P< 0.001), or both (OR = 1.35, 95% CI 1.29 - 1.40; P< 0.001) were more likely to have a life-threatening COVID-19 disease. These results were consistent when evaluating hospitalization. A sensitivity analysis evaluating specific IS revealed that TNF inhibitors were protective against life-threatening disease (OR = 0.80, 95% CI 0.66- 0.96; P=0.017) and hospitalization (OR = 0.80, 95% CI 0.73 - 0.89; P< 0.001). CONCLUSIONS: Patients with pre-existing AID, exposure to IS, or both are more likely to have a life-threatening disease or hospitalization. These patients may thus require tailored monitoring and preventative measures to minimize negative consequences of COVID-19.

4.
Sleep ; 2023 May 11.
Article in English | MEDLINE | ID: covidwho-2316915

ABSTRACT

STUDY OBJECTIVES: Obstructive sleep apnea (OSA) has been associated with more severe acute coronavirus disease-2019 (COVID-19) outcomes. We assessed OSA as a potential risk factor for Post-Acute Sequelae of SARS-CoV-2 (PASC). METHODS: We assessed the impact of preexisting OSA on the risk for probable PASC in adults and children using electronic health record data from multiple research networks. Three research networks within the REsearching COVID to Enhance Recovery initiative (PCORnet Adult, PCORnet Pediatric, and the National COVID Cohort Collaborative [N3C]) employed a harmonized analytic approach to examine the risk of probable PASC in COVID-19-positive patients with and without a diagnosis of OSA prior to pandemic onset. Unadjusted odds ratios (ORs) were calculated as well as ORs adjusted for age group, sex, race/ethnicity, hospitalization status, obesity, and preexisting comorbidities. RESULTS: Across networks, the unadjusted OR for probable PASC associated with a preexisting OSA diagnosis in adults and children ranged from 1.41 to 3.93. Adjusted analyses found an attenuated association that remained significant among adults only. Multiple sensitivity analyses with expanded inclusion criteria and covariates yielded results consistent with the primary analysis. CONCLUSIONS: Adults with preexisting OSA were found to have significantly elevated odds of probable PASC. This finding was consistent across data sources, approaches for identifying COVID-19-positive patients, and definitions of PASC. Patients with OSA may be at elevated risk for PASC after SARS-CoV-2 infection and should be monitored for post-acute sequelae.

5.
J Am Med Inform Assoc ; 30(6): 1125-1136, 2023 05 19.
Article in English | MEDLINE | ID: covidwho-2298624

ABSTRACT

OBJECTIVE: Clinical encounter data are heterogeneous and vary greatly from institution to institution. These problems of variance affect interpretability and usability of clinical encounter data for analysis. These problems are magnified when multisite electronic health record (EHR) data are networked together. This article presents a novel, generalizable method for resolving encounter heterogeneity for analysis by combining related atomic encounters into composite "macrovisits." MATERIALS AND METHODS: Encounters were composed of data from 75 partner sites harmonized to a common data model as part of the NIH Researching COVID to Enhance Recovery Initiative, a project of the National Covid Cohort Collaborative. Summary statistics were computed for overall and site-level data to assess issues and identify modifications. Two algorithms were developed to refine atomic encounters into cleaner, analyzable longitudinal clinical visits. RESULTS: Atomic inpatient encounters data were found to be widely disparate between sites in terms of length-of-stay (LOS) and numbers of OMOP CDM measurements per encounter. After aggregating encounters to macrovisits, LOS and measurement variance decreased. A subsequent algorithm to identify hospitalized macrovisits further reduced data variability. DISCUSSION: Encounters are a complex and heterogeneous component of EHR data and native data issues are not addressed by existing methods. These types of complex and poorly studied issues contribute to the difficulty of deriving value from EHR data, and these types of foundational, large-scale explorations, and developments are necessary to realize the full potential of modern real-world data. CONCLUSION: This article presents method developments to manipulate and resolve EHR encounter data issues in a generalizable way as a foundation for future research and analysis.


Subject(s)
COVID-19 , Electronic Health Records , Humans , Health Facilities , Algorithms , Length of Stay
6.
BMC Med ; 21(1): 58, 2023 02 16.
Article in English | MEDLINE | ID: covidwho-2276360

ABSTRACT

BACKGROUND: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of long COVID are still in flux, and the deployment of an ICD-10-CM code for long COVID in the USA took nearly 2 years after patients had begun to describe their condition. Here, we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." METHODS: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 33,782), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. RESULTS: We established the diagnoses most commonly co-occurring with U09.9 and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty and low unemployment. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients. CONCLUSIONS: This work offers insight into potential subtypes and current practice patterns around long COVID and speaks to the existence of disparities in the diagnosis of patients with long COVID. This latter finding in particular requires further research and urgent remediation.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Female , International Classification of Diseases , Pandemics , COVID-19/diagnosis , COVID-19/epidemiology , SARS-CoV-2
7.
Lancet Digit Health ; 4(7): e532-e541, 2022 07.
Article in English | MEDLINE | ID: covidwho-1852294

ABSTRACT

BACKGROUND: Post-acute sequelae of SARS-CoV-2 infection, known as long COVID, have severely affected recovery from the COVID-19 pandemic for patients and society alike. Long COVID is characterised by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous definition. Studies of electronic health records are a crucial element of the US National Institutes of Health's RECOVER Initiative, which is addressing the urgent need to understand long COVID, identify treatments, and accurately identify who has it-the latter is the aim of this study. METHODS: Using the National COVID Cohort Collaborative's (N3C) electronic health record repository, we developed XGBoost machine learning models to identify potential patients with long COVID. We defined our base population (n=1 793 604) as any non-deceased adult patient (age ≥18 years) with either an International Classification of Diseases-10-Clinical Modification COVID-19 diagnosis code (U07.1) from an inpatient or emergency visit, or a positive SARS-CoV-2 PCR or antigen test, and for whom at least 90 days have passed since COVID-19 index date. We examined demographics, health-care utilisation, diagnoses, and medications for 97 995 adults with COVID-19. We used data on these features and 597 patients from a long COVID clinic to train three machine learning models to identify potential long COVID among all patients with COVID-19, patients hospitalised with COVID-19, and patients who had COVID-19 but were not hospitalised. Feature importance was determined via Shapley values. We further validated the models on data from a fourth site. FINDINGS: Our models identified, with high accuracy, patients who potentially have long COVID, achieving areas under the receiver operator characteristic curve of 0·92 (all patients), 0·90 (hospitalised), and 0·85 (non-hospitalised). Important features, as defined by Shapley values, include rate of health-care utilisation, patient age, dyspnoea, and other diagnosis and medication information available within the electronic health record. INTERPRETATION: Patients identified by our models as potentially having long COVID can be interpreted as patients warranting care at a specialty clinic for long COVID, which is an essential proxy for long COVID diagnosis as its definition continues to evolve. We also achieve the urgent goal of identifying potential long COVID in patients for clinical trials. As more data sources are identified, our models can be retrained and tuned based on the needs of individual studies. FUNDING: US National Institutes of Health and National Center for Advancing Translational Sciences through the RECOVER Initiative.


Subject(s)
COVID-19 , Adolescent , Adult , COVID-19/complications , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19 Testing , Humans , Machine Learning , Pandemics , SARS-CoV-2 , United States/epidemiology , Post-Acute COVID-19 Syndrome
8.
Virol J ; 19(1): 84, 2022 05 15.
Article in English | MEDLINE | ID: covidwho-1846850

ABSTRACT

BACKGROUND: Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community-acquired pneumonia. Observations shortly after the start of the COVID-19 pandemic in 2020 suggested that ibuprofen was associated with an increased risk of adverse events in COVID-19 patients, but subsequent observational studies failed to demonstrate increased risk and in one case showed reduced risk associated with NSAID use. METHODS: A 38-center retrospective cohort study was performed that leveraged the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative. A propensity-matched cohort of 19,746 COVID-19 inpatients was constructed by matching cases (treated with NSAIDs at the time of admission) and 19,746 controls (not treated) from 857,061 patients with COVID-19 available for analysis. The primary outcome of interest was COVID-19 severity in hospitalized patients, which was classified as: moderate, severe, or mortality/hospice. Secondary outcomes were acute kidney injury (AKI), extracorporeal membrane oxygenation (ECMO), invasive ventilation, and all-cause mortality at any time following COVID-19 diagnosis. RESULTS: Logistic regression showed that NSAID use was not associated with increased COVID-19 severity (OR: 0.57 95% CI: 0.53-0.61). Analysis of secondary outcomes using logistic regression showed that NSAID use was not associated with increased risk of all-cause mortality (OR 0.51 95% CI: 0.47-0.56), invasive ventilation (OR: 0.59 95% CI: 0.55-0.64), AKI (OR: 0.67 95% CI: 0.63-0.72), or ECMO (OR: 0.51 95% CI: 0.36-0.7). In contrast, the odds ratios indicate reduced risk of these outcomes, but our quantitative bias analysis showed E-values of between 1.9 and 3.3 for these associations, indicating that comparatively weak or moderate confounder associations could explain away the observed associations. CONCLUSIONS: Study interpretation is limited by the observational design. Recording of NSAID use may have been incomplete. Our study demonstrates that NSAID use is not associated with increased COVID-19 severity, all-cause mortality, invasive ventilation, AKI, or ECMO in COVID-19 inpatients. A conservative interpretation in light of the quantitative bias analysis is that there is no evidence that NSAID use is associated with risk of increased severity or the other measured outcomes. Our results confirm and extend analogous findings in previous observational studies using a large cohort of patients drawn from 38 centers in a nationally representative multicenter database.


Subject(s)
Acute Kidney Injury , COVID-19 , Anti-Inflammatory Agents, Non-Steroidal/adverse effects , COVID-19 Testing , Cohort Studies , Humans , Pandemics , Retrospective Studies
9.
J Am Med Inform Assoc ; 29(7): 1172-1182, 2022 06 14.
Article in English | MEDLINE | ID: covidwho-1795238

ABSTRACT

OBJECTIVE: The goals of this study were to harmonize data from electronic health records (EHRs) into common units, and impute units that were missing. MATERIALS AND METHODS: The National COVID Cohort Collaborative (N3C) table of laboratory measurement data-over 3.1 billion patient records and over 19 000 unique measurement concepts in the Observational Medical Outcomes Partnership (OMOP) common-data-model format from 55 data partners. We grouped ontologically similar OMOP concepts together for 52 variables relevant to COVID-19 research, and developed a unit-harmonization pipeline comprised of (1) selecting a canonical unit for each measurement variable, (2) arriving at a formula for conversion, (3) obtaining clinical review of each formula, (4) applying the formula to convert data values in each unit into the target canonical unit, and (5) removing any harmonized value that fell outside of accepted value ranges for the variable. For data with missing units for all the results within a lab test for a data partner, we compared values with pooled values of all data partners, using the Kolmogorov-Smirnov test. RESULTS: Of the concepts without missing values, we harmonized 88.1% of the values, and imputed units for 78.2% of records where units were absent (41% of contributors' records lacked units). DISCUSSION: The harmonization and inference methods developed herein can serve as a resource for initiatives aiming to extract insight from heterogeneous EHR collections. Unique properties of centralized data are harnessed to enable unit inference. CONCLUSION: The pipeline we developed for the pooled N3C data enables use of measurements that would otherwise be unavailable for analysis.


Subject(s)
COVID-19 , Electronic Health Records , Cohort Studies , Data Collection , Humans
10.
JAMA Netw Open ; 5(2): e2143151, 2022 02 01.
Article in English | MEDLINE | ID: covidwho-1669321

ABSTRACT

Importance: Understanding of SARS-CoV-2 infection in US children has been limited by the lack of large, multicenter studies with granular data. Objective: To examine the characteristics, changes over time, outcomes, and severity risk factors of children with SARS-CoV-2 within the National COVID Cohort Collaborative (N3C). Design, Setting, and Participants: A prospective cohort study of encounters with end dates before September 24, 2021, was conducted at 56 N3C facilities throughout the US. Participants included children younger than 19 years at initial SARS-CoV-2 testing. Main Outcomes and Measures: Case incidence and severity over time, demographic and comorbidity severity risk factors, vital sign and laboratory trajectories, clinical outcomes, and acute COVID-19 vs multisystem inflammatory syndrome in children (MIS-C), and Delta vs pre-Delta variant differences for children with SARS-CoV-2. Results: A total of 1 068 410 children were tested for SARS-CoV-2 and 167 262 test results (15.6%) were positive (82 882 [49.6%] girls; median age, 11.9 [IQR, 6.0-16.1] years). Among the 10 245 children (6.1%) who were hospitalized, 1423 (13.9%) met the criteria for severe disease: mechanical ventilation (796 [7.8%]), vasopressor-inotropic support (868 [8.5%]), extracorporeal membrane oxygenation (42 [0.4%]), or death (131 [1.3%]). Male sex (odds ratio [OR], 1.37; 95% CI, 1.21-1.56), Black/African American race (OR, 1.25; 95% CI, 1.06-1.47), obesity (OR, 1.19; 95% CI, 1.01-1.41), and several pediatric complex chronic condition (PCCC) subcategories were associated with higher severity disease. Vital signs and many laboratory test values from the day of admission were predictive of peak disease severity. Variables associated with increased odds for MIS-C vs acute COVID-19 included male sex (OR, 1.59; 95% CI, 1.33-1.90), Black/African American race (OR, 1.44; 95% CI, 1.17-1.77), younger than 12 years (OR, 1.81; 95% CI, 1.51-2.18), obesity (OR, 1.76; 95% CI, 1.40-2.22), and not having a pediatric complex chronic condition (OR, 0.72; 95% CI, 0.65-0.80). The children with MIS-C had a more inflammatory laboratory profile and severe clinical phenotype, with higher rates of invasive ventilation (117 of 707 [16.5%] vs 514 of 8241 [6.2%]; P < .001) and need for vasoactive-inotropic support (191 of 707 [27.0%] vs 426 of 8241 [5.2%]; P < .001) compared with those who had acute COVID-19. Comparing children during the Delta vs pre-Delta eras, there was no significant change in hospitalization rate (1738 [6.0%] vs 8507 [6.2%]; P = .18) and lower odds for severe disease (179 [10.3%] vs 1242 [14.6%]) (decreased by a factor of 0.67; 95% CI, 0.57-0.79; P < .001). Conclusions and Relevance: In this cohort study of US children with SARS-CoV-2, there were observed differences in demographic characteristics, preexisting comorbidities, and initial vital sign and laboratory values between severity subgroups. Taken together, these results suggest that early identification of children likely to progress to severe disease could be achieved using readily available data elements from the day of admission. Further work is needed to translate this knowledge into improved outcomes.


Subject(s)
COVID-19/epidemiology , Adolescent , Age Distribution , COVID-19/complications , COVID-19/diagnosis , COVID-19/therapy , COVID-19/virology , Child , Child, Preschool , Comorbidity , Disease Progression , Early Diagnosis , Female , Humans , Infant , Male , Risk Factors , SARS-CoV-2 , Severity of Illness Index , Sociodemographic Factors , Systemic Inflammatory Response Syndrome/diagnosis , Systemic Inflammatory Response Syndrome/epidemiology , Systemic Inflammatory Response Syndrome/therapy , Systemic Inflammatory Response Syndrome/virology , United States/epidemiology , Vital Signs
11.
J Biomed Inform ; 127: 104002, 2022 03.
Article in English | MEDLINE | ID: covidwho-1639382

ABSTRACT

OBJECTIVE: The large-scale collection of observational data and digital technologies could help curb the COVID-19 pandemic. However, the coexistence of multiple Common Data Models (CDMs) and the lack of data extract, transform, and load (ETL) tool between different CDMs causes potential interoperability issue between different data systems. The objective of this study is to design, develop, and evaluate an ETL tool that transforms the PCORnet CDM format data into the OMOP CDM. METHODS: We developed an open-source ETL tool to facilitate the data conversion from the PCORnet CDM and the OMOP CDM. The ETL tool was evaluated using a dataset with 1000 patients randomly selected from the PCORnet CDM at Mayo Clinic. Information loss, data mapping accuracy, and gap analysis approaches were conducted to assess the performance of the ETL tool. We designed an experiment to conduct a real-world COVID-19 surveillance task to assess the feasibility of the ETL tool. We also assessed the capacity of the ETL tool for the COVID-19 data surveillance using data collection criteria of the MN EHR Consortium COVID-19 project. RESULTS: After the ETL process, all the records of 1000 patients from 18 PCORnet CDM tables were successfully transformed into 12 OMOP CDM tables. The information loss for all the concept mapping was less than 0.61%. The string mapping process for the unit concepts lost 2.84% records. Almost all the fields in the manual mapping process achieved 0% information loss, except the specialty concept mapping. Moreover, the mapping accuracy for all the fields were 100%. The COVID-19 surveillance task collected almost the same set of cases (99.3% overlaps) from the original PCORnet CDM and target OMOP CDM separately. Finally, all the data elements for MN EHR Consortium COVID-19 project could be captured from both the PCORnet CDM and the OMOP CDM. CONCLUSION: We demonstrated that our ETL tool could satisfy the data conversion requirements between the PCORnet CDM and the OMOP CDM. The outcome of the work would facilitate the data retrieval, communication, sharing, and analysis between different institutions for not only COVID-19 related project, but also other real-world evidence-based observational studies.


Subject(s)
COVID-19 , COVID-19/epidemiology , Databases, Factual , Electronic Health Records , Humans , Information Storage and Retrieval , Pandemics , SARS-CoV-2
12.
EBioMedicine ; 74: 103722, 2021 Dec.
Article in English | MEDLINE | ID: covidwho-1536517

ABSTRACT

BACKGROUND: Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 (PASC or "long COVID"), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations. Patient-led studies are of particular importance for understanding the natural history of COVID-19, but integration is hampered because they often use different terms to describe the same symptom or condition. This significant disparity in patient versus clinical characterization motivated the proposed ontological approach to specifying manifestations, which will improve capture and integration of future long COVID studies. METHODS: The Human Phenotype Ontology (HPO) is a widely used standard for exchange and analysis of phenotypic abnormalities in human disease but has not yet been applied to the analysis of COVID-19. FUNDING: We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to HPO terms. We present layperson synonyms and definitions that can be used to link patient self-report questionnaires to standard medical terminology. Long COVID clinical manifestations are not assessed consistently across studies, and most manifestations have been reported with a wide range of synonyms by different authors. Across at least 10 cohorts, authors reported 31 unique clinical features corresponding to HPO terms; the most commonly reported feature was Fatigue (median 45.1%) and the least commonly reported was Nausea (median 3.9%), but the reported percentages varied widely between studies. INTERPRETATION: Translating long COVID manifestations into computable HPO terms will improve analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared/pooled more effectively. Furthermore, mapping lay terminology to HPO will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, thereby improving the stratification, diagnosis, and treatment of long COVID. FUNDING: U24TR002306; UL1TR001439; P30AG024832; GBMF4552; R01HG010067; UL1TR002535; K23HL128909; UL1TR002389; K99GM145411.


Subject(s)
COVID-19/complications , COVID-19/pathology , COVID-19/diagnosis , Humans , SARS-CoV-2 , Post-Acute COVID-19 Syndrome
13.
J Am Med Inform Assoc ; 29(4): 609-618, 2022 03 15.
Article in English | MEDLINE | ID: covidwho-1443051

ABSTRACT

OBJECTIVE: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations. MATERIALS AND METHODS: We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements. RESULTS: Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 sites (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback. DISCUSSION: We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate. CONCLUSION: By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.


Subject(s)
COVID-19 , Cohort Studies , Data Accuracy , Health Insurance Portability and Accountability Act , Humans , United States
14.
JAMA Netw Open ; 4(7): e2116901, 2021 07 01.
Article in English | MEDLINE | ID: covidwho-1306627

ABSTRACT

Importance: The National COVID Cohort Collaborative (N3C) is a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative COVID-19 cohort to date. This multicenter data set can support robust evidence-based development of predictive and diagnostic tools and inform clinical care and policy. Objectives: To evaluate COVID-19 severity and risk factors over time and assess the use of machine learning to predict clinical severity. Design, Setting, and Participants: In a retrospective cohort study of 1 926 526 US adults with SARS-CoV-2 infection (polymerase chain reaction >99% or antigen <1%) and adult patients without SARS-CoV-2 infection who served as controls from 34 medical centers nationwide between January 1, 2020, and December 7, 2020, patients were stratified using a World Health Organization COVID-19 severity scale and demographic characteristics. Differences between groups over time were evaluated using multivariable logistic regression. Random forest and XGBoost models were used to predict severe clinical course (death, discharge to hospice, invasive ventilatory support, or extracorporeal membrane oxygenation). Main Outcomes and Measures: Patient demographic characteristics and COVID-19 severity using the World Health Organization COVID-19 severity scale and differences between groups over time using multivariable logistic regression. Results: The cohort included 174 568 adults who tested positive for SARS-CoV-2 (mean [SD] age, 44.4 [18.6] years; 53.2% female) and 1 133 848 adult controls who tested negative for SARS-CoV-2 (mean [SD] age, 49.5 [19.2] years; 57.1% female). Of the 174 568 adults with SARS-CoV-2, 32 472 (18.6%) were hospitalized, and 6565 (20.2%) of those had a severe clinical course (invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice). Of the hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March to April 2020 to 8.6% in September to October 2020 (P = .002 for monthly trend). Using 64 inputs available on the first hospital day, this study predicted a severe clinical course using random forest and XGBoost models (area under the receiver operating curve = 0.87 for both) that were stable over time. The factor most strongly associated with clinical severity was pH; this result was consistent across machine learning methods. In a separate multivariable logistic regression model built for inference, age (odds ratio [OR], 1.03 per year; 95% CI, 1.03-1.04), male sex (OR, 1.60; 95% CI, 1.51-1.69), liver disease (OR, 1.20; 95% CI, 1.08-1.34), dementia (OR, 1.26; 95% CI, 1.13-1.41), African American (OR, 1.12; 95% CI, 1.05-1.20) and Asian (OR, 1.33; 95% CI, 1.12-1.57) race, and obesity (OR, 1.36; 95% CI, 1.27-1.46) were independently associated with higher clinical severity. Conclusions and Relevance: This cohort study found that COVID-19 mortality decreased over time during 2020 and that patient demographic characteristics and comorbidities were associated with higher clinical severity. The machine learning models accurately predicted ultimate clinical severity using commonly collected clinical data from the first 24 hours of a hospital admission.


Subject(s)
COVID-19 , Databases, Factual , Forecasting , Hospitalization , Models, Biological , Severity of Illness Index , Adult , Aged , Aged, 80 and over , COVID-19/ethnology , COVID-19/mortality , Comorbidity , Ethnicity , Extracorporeal Membrane Oxygenation , Female , Humans , Hydrogen-Ion Concentration , Male , Middle Aged , Pandemics , Respiration, Artificial , Retrospective Studies , Risk Factors , SARS-CoV-2 , United States , Young Adult
15.
J Clin Transl Sci ; 5(1): e110, 2021 Mar 16.
Article in English | MEDLINE | ID: covidwho-1269358

ABSTRACT

The recipients of NIH's Clinical and Translational Science Awards (CTSA) have worked for over a decade to build informatics infrastructure in support of clinical and translational research. This infrastructure has proved invaluable for supporting responses to the current COVID-19 pandemic through direct patient care, clinical decision support, training researchers and practitioners, as well as public health surveillance and clinical research to levels that could not have been accomplished without the years of ground-laying work by the CTSAs. In this paper, we provide a perspective on our COVID-19 work and present relevant results of a survey of CTSA sites to broaden our understanding of the key features of their informatics programs, the informatics-related challenges they have experienced under COVID-19, and some of the innovations and solutions they developed in response to the pandemic. Responses demonstrated increased reliance by healthcare providers and researchers on access to electronic health record (EHR) data, both for local needs and for sharing with other institutions and national consortia. The initial work of the CTSAs on data capture, standards, interchange, and sharing policies all contributed to solutions, best illustrated by the creation, in record time, of a national clinical data repository in the National COVID-19 Cohort Collaborative (N3C). The survey data support seven recommendations for areas of informatics and public health investment and further study to support clinical and translational research in the post-COVID-19 era.

16.
J Am Med Inform Assoc ; 28(3): 427-443, 2021 03 01.
Article in English | MEDLINE | ID: covidwho-719257

ABSTRACT

OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.


Subject(s)
COVID-19 , Data Science/organization & administration , Information Dissemination , Intersectoral Collaboration , Computer Security , Data Analysis , Ethics Committees, Research , Government Regulation , Humans , National Institutes of Health (U.S.) , United States
SELECTION OF CITATIONS
SEARCH DETAIL